Chosen Plaintext Attack (CPA)

A chosen plaintext attack (CPA) models the scenario where an adversary can choose arbitrary plaintexts and obtain their corresponding ciphertexts that are all generated by encrypting the messages with the same secret key . The adversary's goal is to then decrypt a ciphertext that was obtained by encrypting an unknown message , also with the secret key .

Example

In World War 2, the British would place mines at specific locations and when the Germans found them, they would encrypt their locations and send them to their superiors. The intercepted encrypted messages would later be used at Bletchley Park to break the encryption scheme of the Germans.

This scenario gives the adversary (partial) control over the messages and ciphertexts it has access to and one can imagine this as the attacker being able to influence to some extent the messages that are exchanged by the two authentic parties Alice and Bob.

Note

It is imperative to remember that in the CPA model, all messages are encrypted using the same key.

CPA-Security

So what does it mean for an encryption scheme to be secure under the chosen plaintext thread model?

Definition: CPA-Security

The efficient adversary is given oracle access to the encryption function for some random secret key and queries it with messages to obtain their respective ciphertexts . The cipher is CPA-secure if for any two messages and ciphertext belonging to either or , the adversary still cannot guess with probability non-negligibly greater than whether is the encryption of or , i.e.

Definition Breakdown

As previously mentioned, the adversary has oracle access to and can thus obtain plaintext-ciphertext pairs . They then attempt to guess whether a given ciphertext belongs to a message or (the adversary of course also knows and ). The word "any" in the definition entails that Eve is even free to choose and herself. The cipher is considered CPA-secure if even with all this information, the adversary cannot guess with success marginally better than if the ciphertext corresponds to or .

At first glance, there appears to be something wrong with this definition. The adversary Eve is free to choose both as well as and . Therefore, it seems that this definition can be trivially broken by Eve simply by choosing to be the same as one of the previously queried messages . When Eve is presented with a ciphertext at the end, she can just check if is the ciphertext she obtained when querying - since , she will know with 100% certainty whether the ciphertext is the encryption of or . This leads to the following consequence for all CPA-secure ciphers.

Necessity of Randomness

There is no CPA-secure cipher with a deterministic encryption function .

If is probabilistic, i.e. it uses internal randomness, then the same message will produce different ciphertexts each time it is encrypted which kills the aforementioned breaking technique stone-dead. It might seem weird that the same message can produce different ciphertexts at first, but this is actually fairly easy to implement. The internal randomness used in each encryption can be encoded in the ciphertext is such a way that it can be recovered later if one knows the secret key .

This property of CPA-security means that it is a stronger notion than semantic security - every CPA-secure cipher is also semantically secure, but the opposite is not necessarily true. In fact, CPA-security is nowadays the bare minimum definition which is expected to be satisfied by a cipher in order to be considered usable, since it provides security in the case of key reuse.

Theoretical Implementation

As with many things in cryptography, pseudorandom function generators (PRFGs) come to the rescue when trying to implement a CPA-secure cipher.

Note

This is just a proof-of-concept and the following cipher is not used in practice.

Suppose we have a pseudorandom function generator . The encryption function is first going to generate a random string of length . It will then seed the PRFG with the key (which also has length ) and it will pass to it. The output of the PRFG will be XOR-ed with the message. Finally, will prepend to this XOR-ed value:

#![allow(unused)]
fn main() {
fn Enc(key: str[n], message: str[l]) -> str[n + l] 
{
	let r = random_binary_string(length: n);
	return r + (xor(PRFG(key, r), message));
}
}

The decryption function takes the ciphertext of length and parses it as two strings - a string of length and a string of length . It then seeds the PRFG with the key and passes it the string . The output of the PRFG is then XOR-ed with to obtain the original message.

#![allow(unused)]
fn main() {
fn Dec((key: str[n], ciphertext: str[n + l])) -> str[l] 
{
	let r = ciphertext[0..n];
	let z = ciphertext[n..];
	return xor(PRFG(key, r), z);
}
}

Indeed, this is a valid encryption scheme - every ciphertext can only be mapped to one plaintext.

Proof: Validity

Given a key , the encryption of a message is

Let's see the decryption of this output for the same key :

Therefore, the validity condition is satisfied.

Moreover, this construction has a probabilistic encryption function and also turns out to be CPA-secure.

Proof: CPA-Security

Suppose we follow the CPA model and the adversary Eve obtains the ciphertexts of the messages . For the -th encryption a random string of length is generated and each message is also encrypted with the same key (as per the definition of CPA-security).

Each of these strings is generated randomly, so the probability that the last string is the same as one of the previous random strings is , which is negligible.

Suppose, towards contradiction that Eve could break the CPA-security of the cipher with probability for some non-negligible . If instead of a PRFG the encryption used a truly random function , then the probability that Eve could distinguish between and would be strictly because she would simply lack any additional information. However, the encryption does use a PRFG and if Eve can distinguish between an encryption of and an encryption of with probability non-negligibly greater than , then that means that she can distinguish between the output of a PRFG and that of a truly random function with non-neglgible advantage over , which is a contradiction.